Usage of Dataset class

We here show the API of the Dataset class, conceived to run and collects ensemble calculations.

[1]:
from BigDFT import Datasets as D, Calculators as C, Inputfiles as I
import numpy as np
import matplotlib.pyplot as plt

As first explanatory example, we present the analysis of the convergence of the Ground State (GS) energy of a \(CO\) with respect to the size of the simulation domain.

Analysis of dataset for the convergence of the GS of the CO molecule

We instantiate an object of the system calculator class which specifies the number of threads and mpi processor used for the computation, we also instantiate the Dataset object that manages the ensemble of calculation needed for the convergence analysis.

[2]:
code=C.SystemCalculator(omp=2,mpi_run='mpirun -np 4',skip=True)
study = D.Dataset(label='CO_GS',run_dir='CO_GS',posinp='CO_posinp.xyz',molecule_shape = 'linear')
Initialize a Calculator with OMP_NUM_THREADS=2 and command mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft

study is characterized by its label ‘CO_GS’ and contains the path run_dir in which all the calculations of the dataset will be performed and the name of the posinp of the molecule. The variable molecule_shape has been introduced to show how further information can be passed to the dataset instances. This information can be extracted as follows

[3]:
print study.global_options()
print study.get_global_option('molecule_shape')
{'molecule_shape': 'linear', 'run_dir': 'CO_GS', 'posinp': 'CO_posinp.xyz', 'label': 'CO_GS'}
linear

Global options can be also added and removed as follows

[4]:
study.update_global_options(new_option = 'value')
print study.global_options()
{'molecule_shape': 'linear', 'run_dir': 'CO_GS', 'posinp': 'CO_posinp.xyz', 'new_option': 'value', 'label': 'CO_GS'}
[5]:
# remove the new_option key
study.pop_global_option('new_option')
study.global_options()
[5]:
{'label': 'CO_GS',
 'molecule_shape': 'linear',
 'posinp': 'CO_posinp.xyz',
 'run_dir': 'CO_GS'}
[6]:
# Define the default parameters of the input file
inp = I.Inputfile()
inp.set_hgrid(0.37)
inp.set_wavefunction_convergence(gnrm=1.0e-5)
inp
[6]:
{'dft': {'gnrm_cv': 1e-05, 'hgrids': 0.37}}

We introduce a list of values of rmult. The convergence analysis will be performed on this parameter.

[7]:
rmult = [5.0,6.0,7.0,8.0]

We prepare the study for the convergence analysis by appending the run associated to the values of rmult. Each run is characterized by an id and contains the InputFile object as input as well as the code used to perform the calculation. If the same id is provided more than once, the function gives a Value Error.

[8]:
for r in rmult:
    inp.set_rmult(coarse=r,fine=9.0)
    study.append_run(id={'rmult':r},runner=code,input=inp)

The following member of the Dataset class shows how to refer to the various calculation of study.

[9]:
print study.calculators
print study.ids
[{'runs': [0, 1, 2, 3], 'calc': <BigDFT.Calculators.SystemCalculator instance at 0x7fa258b38518>}]
[{'rmult': 5.0}, {'rmult': 6.0}, {'rmult': 7.0}, {'rmult': 8.0}]
[10]:
# Extract the study id and runs parameters for each value of the runs key in the study.calculator.
# Shows just one for shortness
i = 2
print study.ids[i]
print study.runs[i]
{'rmult': 7.0}
{'molecule_shape': 'linear', 'run_dir': 'CO_GS', 'posinp': 'CO_posinp.xyz', 'input': {'dft': {'rmult': [7.0, 9.0], 'hgrids': 0.37, 'gnrm_cv': 1e-05}}, 'label': 'CO_GS'}

study.names contains a list of strings with the ids of the run. May be useful for labelling.

[11]:
study.names
[11]:
['rmult:5.0', 'rmult:6.0', 'rmult:7.0', 'rmult:8.0']

The appended simulation can be executed with the run method. The skip=True in the code instance guarantees that computation already performed are not executed again.

[12]:
study.run()
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:5.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:5.0 -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:6.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:6.0 -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:7.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:7.0 -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:8.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:8.0 -s Yes
[12]:
{0: <BigDFT.Logfiles.Logfile instance at 0x7fa28fb9c680>,
 1: <BigDFT.Logfiles.Logfile instance at 0x7fa2587f87a0>,
 2: <BigDFT.Logfiles.Logfile instance at 0x7fa25875e4d0>,
 3: <BigDFT.Logfiles.Logfile instance at 0x7fa2586885f0>}

The class member study.results contains a dictionary with the logfiles of the computations performed by the run method. The cardinal key respects the order of the append_run.

[13]:
results = study.results
results
[13]:
{0: <BigDFT.Logfiles.Logfile instance at 0x7fa28fb9c680>,
 1: <BigDFT.Logfiles.Logfile instance at 0x7fa2587f87a0>,
 2: <BigDFT.Logfiles.Logfile instance at 0x7fa25875e4d0>,
 3: <BigDFT.Logfiles.Logfile instance at 0x7fa2586885f0>}

Several operation can be performed on the results by using the methods of the logfile class. Here we provide some examples.

Perform a convergence plot of the total energy w.r.t. the domain size

[34]:
for calc in study.calculators[0]['runs']:
    print study.ids[calc]['rmult'], results[calc].energy, study.names[calc]
    plt.scatter(study.ids[calc]['rmult'],results[calc].energy,label=study.names[calc])
plt.legend()
plt.title('Energy',size=12)
plt.ylim(-21.661,-21.662)
plt.show()
5.0 -21.661153985 rmult:5.0
6.0 -21.6613458421 rmult:6.0
7.0 -21.6613579336 rmult:7.0
8.0 -21.6613594644 rmult:8.0
../_images/tutorials_Datasets_27_1.png

Other useful quantities can be extract using the methods of the Logfile class, for instance

[15]:
print results[0].dipole
print results[0].evals[0][0]
print results[0].log['dft']['rmult']
print results[0].log['Electric Dipole Moment (AU)']
[-0.0002296574, -0.0002296574, 0.09288163]
[-1.07897861 -0.5212773  -0.44461433 -0.44461428 -0.33423529]
[5.0, 9.0]
{'norm(P)': 0.0928822004, 'P vector': [-0.0002296574, -0.0002296574, 0.09288163]}
[16]:
dos = results[0].get_dos(label=study.names[0])
dos.append_from_bandarray(results[2].evals,label=study.names[2])
dos.plot(sigma = 0.2)
../_images/tutorials_Datasets_30_0.png

Post processing analysis: fetch_results and post-processing function

Results associated to specific id and/or attribute can be extracted as follows:

[17]:
study.fetch_results()
[17]:
[<BigDFT.Logfiles.Logfile instance at 0x7fa28fb9c680>,
 <BigDFT.Logfiles.Logfile instance at 0x7fa2587f87a0>,
 <BigDFT.Logfiles.Logfile instance at 0x7fa25875e4d0>,
 <BigDFT.Logfiles.Logfile instance at 0x7fa2586885f0>]

If an attribute and/or an id is provided, fetch_results extracts only the wanted results, for instance

[18]:
print  study.fetch_results(attribute='energy')
print  study.fetch_results({'rmult' : 8},attribute='energy')
[-21.66115398501919, -21.661345842050096, -21.661357933587652, -21.661359464434042]
[-21.661359464434042]

It is also possible to define a post-processing function that is passed to the Dataset instance. This function is called after the run and the output of study.run() and contains the output of the post-processing function. We show some examples.

Extraction of the ground state dipole

[19]:
def get_dipole(dataset):
    dipole = dataset.fetch_results(attribute='dipole')
    return dipole
[20]:
study.set_postprocessing_function(get_dipole)
[21]:
dipole_GS = study.run()
dipole_GS
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:5.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:5.0 -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:6.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:6.0 -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:7.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:7.0 -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:8.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:8.0 -s Yes
[21]:
[[-0.0002296574, -0.0002296574, 0.09288163],
 [-3.391146e-05, -3.391146e-05, 0.09150228],
 [-2.716707e-06, -2.716707e-06, 0.09138863],
 [-1.027985e-05, -1.027985e-05, 0.09137423]]

Extraction of the box size

[22]:
def get_size(study): # how to do def get_size(study,unit)
    sizes = []
    results = study.fetch_results()
    for calc,res in enumerate(results):
        sizes.append(res.log['Sizes of the simulation domain']['AU'])
    return sizes
[23]:
study.set_postprocessing_function(get_size)
[24]:
boxsize = study.run()
boxsize
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:5.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:5.0 -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:6.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:6.0 -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:7.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:7.0 -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_GS'
Creating the yaml input file "CO_GS/rmult:8.0.yaml"
Run directory CO_GS
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n rmult:8.0 -s Yes
[24]:
[[15.54, 15.54, 15.91],
 [18.87, 18.87, 18.87],
 [21.83, 21.83, 21.83],
 [25.16, 25.16, 25.16]]

Usage of Dataset for the computation of the statical polarizability tensor \(\alpha\)

We choose a single value of rmult, among the ones computed in the gs analysis, and compute the statical polarizability tensor of the \(CO\) molecule.

[25]:
ind = 2
print 'rmult value : ', study.ids[ind]['rmult']
rmult value :  7.0
[26]:
# The ground state dipole can be extracted in several ways from the previous analysis
print dipole_GS[ind]
print study.fetch_results(study.ids[ind],attribute='dipole')
[-2.716707e-06, -2.716707e-06, 0.09138863]
[[-2.716707e-06, -2.716707e-06, 0.09138863]]
[27]:
# set the input file for the statical polarizability computation
inp.set_rmult(coarse = study.ids[ind]['rmult'], fine = 9.0)
inp
[27]:
{'dft': {'gnrm_cv': 1e-05, 'hgrids': 0.37, 'rmult': [7.0, 9.0]}}

Define the dataset (called ef, i.e. electric field) for the computation of \(\alpha\) and append the relevant runs

[28]:
intensity=1.e-2 #assume that this value is compatible with the linear response regime
ef=D.Dataset(label='alpha',run_dir='CO_alpha',input=inp,posinp='CO_posinp.xyz',d0=dipole_GS[ind],F=intensity)
for idir,coord in enumerate(['x','y','z']):
    el=np.zeros(3)
    el[idir]=intensity
    inp.apply_electric_field(el.tolist())
    ef.append_run({'id': coord,'F':intensity},code,input=inp)
[29]:
print ef.ids
[{'id': 'x', 'F': 0.01}, {'id': 'y', 'F': 0.01}, {'id': 'z', 'F': 0.01}]
[30]:
def extract_alpha(ef):
    """
    alpha_ij isthe i-th component of the vector d-d0, computed with a field in the j-th direction,
    divided for the intensity of the field.
    """
    d0=np.array(ef.get_global_option('d0'))
    F=ef.get_global_option('F')
    d=ef.fetch_results(attribute='dipole')
    alpha=np.mat(np.zeros(9)).reshape(3,3)
    for idir in range(3):
        alpha[idir]=(np.array(d[idir])-d0)/F
    return alpha
[31]:
ef.set_postprocessing_function(extract_alpha)
[32]:
alpha = ef.run()
alpha
Create the sub-directory 'CO_alpha'
Copy the posinp file 'CO_posinp.xyz' into 'CO_alpha'
Creating the yaml input file "CO_alpha/F:0.01,id:x.yaml"
Run directory CO_alpha
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n F:0.01,id:x -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_alpha'
Creating the yaml input file "CO_alpha/F:0.01,id:y.yaml"
Run directory CO_alpha
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n F:0.01,id:y -s Yes
Copy the posinp file 'CO_posinp.xyz' into 'CO_alpha'
Creating the yaml input file "CO_alpha/F:0.01,id:z.yaml"
Run directory CO_alpha
Executing command:  mpirun -np 4 /local/deutsch/Forge/BigDFT/build-mpif90/install/bin/bigdft -n F:0.01,id:z -s Yes
[32]:
matrix([[ 1.26637417e+01, -2.31315400e-04,  4.23620000e-02],
        [-2.31315400e-04,  1.26637417e+01,  4.23620000e-02],
        [ 4.21875400e-04,  4.21875400e-04,  1.60751070e+01]])

A list of posinp for several molecules can be found as follows

[33]:
from BigDFT.Database import Molecules
CO=Molecules.Molecule('CO')
print CO
{'units': 'angstroem', 'positions': [{'C': [0.0, 0.0, 0.0], 'sym': 'C'}, {'sym': 'O', 'O': [0.0, 0.0, 1.1282]}], 'global monopole': 0.0}

so this analysis can be easily performed for other cases.

[ ]: